For nnet3-xvector-get-egs I'm assuming that we don't need to worry about left or right context as we do in other binaries.
Also included in this pull request are a few improvements to the xvector objf/deriv code, such as fixing numerical overflow issues, typos, and mistakes in the comments.
Thanks!
Merging.
Please try to finish the get-egs script, including shuffling of the egs.
Add the --max-jobs-run $nj option to the $cmd when shuffling to avoid overwhelming the disk (the shuffling stage will have $num_train_archives jobs).
For extracting the examples for the training subset and the validation subset, you'll have to add an extra option to the Python script to make the chunk sizes deterministic rather than random (note: there will typically be one job, and --num-archives=3). The left and right chunk sizes will be identical, and they will range from min-chunk-size to max-chunk-size in a geometric progression as you go from the first to the last archive.
For the 'archive-chunk-sizes' file you may have to add a way of specifying a filename suffix or pattern, so that we can get separate versions of that file for the training-subset and validation-subset egs.
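One way the deterministic chunk sizes described above might be computed is sketched below. This is only an illustration, not the code in the actual Python script: the function name `geometric_chunk_sizes`, its arguments, and the single-archive fallback are all assumptions made for the example.

```python
import math


def geometric_chunk_sizes(min_chunk_size, max_chunk_size, num_archives):
    """Return one chunk size per archive, geometrically spaced.

    Hypothetical helper: the first archive gets min_chunk_size, the last
    gets max_chunk_size, and the ones in between grow by a constant ratio.
    """
    if num_archives < 1:
        raise ValueError("num_archives must be at least 1")
    if min_chunk_size < 1 or max_chunk_size < min_chunk_size:
        raise ValueError("need 1 <= min_chunk_size <= max_chunk_size")
    if num_archives == 1:
        # Assumption: with a single archive, use the geometric midpoint.
        return [int(round(math.sqrt(min_chunk_size * max_chunk_size)))]
    ratio = max_chunk_size / min_chunk_size
    return [
        int(round(min_chunk_size * ratio ** (i / (num_archives - 1))))
        for i in range(num_archives)
    ]


if __name__ == "__main__":
    # With --num-archives=3, the left and right chunk sizes are identical
    # within each archive, so each archive is described by one number.
    for archive, size in enumerate(geometric_chunk_sizes(50, 300, 3), start=1):
        print(archive, size, size)  # archive index, left size, right size
```

The values 50 and 300 are placeholders for min-chunk-size and max-chunk-size; because nothing here is random, repeated runs give the same sizes for the training-subset and validation-subset egs.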
> Please try to finish the get-egs script- including shuffling of egs.
Will do.
> For the 'archive-chunk-sizes' file you may have to add some kind of way of specifying a filename suffix or pattern so that we can get separate versions of that file for the training-subset and validation-subset egs.
If I understand this correctly, we plan on using the same utterances in train and validation (but of course, different cuts)? Edit: Never mind, I misread that.
No, for validation the utterances are a held-out set; see the get_egs.sh script, which creates that subset.
You'd call that Python script two more times: once for the training subset and once for validation. Both use a different, smaller number of frames per archive; again, that's drafted in the script.